
clear

*From https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt
insheet using "statefips_abbreviations.csv"

drop in 1
drop in 1

rename v2 stabbrev
rename v3 stfips
replace stfips = "0" + stfips if strlen(stfips)==1
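
*Optional check, added as a sketch rather than part of the original workflow: the m:1 merges
*on stabbrev later in this file require stabbrev to uniquely identify rows of this crosswalk.
*capture noisily reports a failure in the log without stopping the do-file.
capture noisily isid stabbrev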

save "statefips_abbreviations.dta", replace

clear

*Parse each of the ten fixed-width zip-county files and attach state FIPS codes
forvalues i = 1/10 {
    insheet using zipcty`i'.txt

    *Drop the first row of the raw file
    drop in 1

    *Extract fields from the fixed-width record by character position
    gen homezip = substr(v1,1,5)
    gen homecty = substr(v1,26,3)
    gen homecty_name = substr(v1,29,.)
    gen stabbrev = substr(v1,24,2)

    *Attach the two-digit state FIPS code using the state abbreviation
    merge m:1 stabbrev using "statefips_abbreviations.dta"
    rename _merge mergestabbrev

    *Five-digit county FIPS = two-digit state FIPS + three-digit county code
    gen homectyfips = stfips + homecty

    save zipcty`i'.dta, replace
    clear
}

use zipcty1.dta
forvalues i = 2(1)10{
append using zipcty`i'.dta
}

rename homezip homezipstr

destring homectyfips, replace
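
*Optional check (an addition, not part of the original): zip codes parsed from the
*fixed-width files should be five characters long. Rows with mergestabbrev==2 come only
*from the state crosswalk and have no zip code, so they are excluded from the check.
*capture noisily reports any contradictions without stopping the do-file.
capture noisily assert strlen(homezipstr)==5 if mergestabbrev!=2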

save allzipcty.dta, replace

clear
use "justnormasylum.dta"
ren cty_fips homectyfips
merge 1:m homectyfips using allzipcty.dta
rename _merge mergehomenorm

local vars hasnormalorasylum hasnormalschool
foreach x of local vars{
ren `x' home_`x'
}

save zipmerge.dta, replace

clear 
use zipmerge.dta

keep homectyfips home_hasnormalorasylum home_hasnormalschool homezipstr homecty_name
drop if homezipstr==""

egen taghomezipcty = tag(homezipstr homectyfips)
keep if taghomezipcty==1

*Only keep the zip-county pairs that merge to normal school or asylum counties

keep if home_hasnormalorasylum==1

duplicates tag homezipstr, gen(duphomezip)

bysort homezipstr: egen maxdup = max(duphomezip)

tab maxdup
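
*The distinct command used below is user-written, not part of official Stata.
*A hedged sketch for installing it when it is missing, assuming internet access to SSC:
capture which distinct
if _rc {
    ssc install distinct
}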

distinct homezipstr if maxdup>=1 & maxdup~=.
distinct homezipstr

bysort homezipstr: egen maxhasnorm = max(home_hasnormalschool)
bysort homezipstr: egen minhasnorm = min(home_hasnormalschool)

tab home_hasnormalschool, missing

save zipcty_norm_allmatches.dta, replace

*Merge these zip-county pairs to the individual level data based on the individual's home zip code

clear
use choice_demog.dta

*Merge people to all the counties associated with their zip code
joinby homezipstr using zipcty_norm_allmatches.dta, unmatched(master)
tab _merge
rename _merge joinbyzip

*Only keep counties that are normal school or asylum counties (this drops the individuals in the
*master dataset whose home zip code did not match any county in the using dataset)

keep if home_hasnormalorasylum==1

save joinbyzip.dta, replace

clear
use joinbyzip.dta

*Merge to the university county from IPEDS
merge m:1 acerecode year using TFS_unitid_aceyear.dta
rename _merge mergepubaceyearziprobust

*Note that the TFS_unitid_aceyear dataset includes years starting in 1971, but the master
*data starts in 1982 because that is when home zip code starts being available. So some of the
*observations with mergepubaceyearziprobust==2 are from years before 1982. Also,
*home zip code is not included in the data in 1994, so that year will be missing from the master data

drop if mergepubaceyearziprobust== 2 & (year<=1981|year==1994)

*Appendix E text statistic: observations with TFS code-year pairs that 
*are not in the restricted-access TFS data

tab mergepubaceyearziprobust

egen tagaceyeartest = tag(acerecode year)

tab tagaceyeartest if mergepubaceyearziprobust==1

tab unitid if mergepubaceyearziprobust==2

*Only very few universities are in the using data (based on institutions in the restricted-access data) but not in
*the master (based on people in the public-access data).

*Drop the observations with mergepubaceyearziprobust==2 since these are at the university-year level, not the individual level,
*and we are focusing on the individual-level data

drop if mergepubaceyearziprobust==2

*Check if homectyfips equals instctyfips for any of the matches

tostring homectyfips, gen(homectyfipsstr)
replace homectyfipsstr = "0" + homectyfipsstr if strlen(homectyfipsstr)==4
gen homestate = substr(homectyfipsstr, 1, 2)
destring homestate, replace

gen homeinstsamecty = cty_fips == homectyfips

gen homezipsameasinstzip = homezipstr == instzip

local vars homeinstsamecty homezipsameasinstzip  
foreach x of local vars{
bysort subjid year acerecode: egen max`x' = max(`x')
}

tab home_hasnormalorasylum, missing
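*duplicates report does not accept gen(); report the duplicates, then tag them with duplicates tag
duplicates report subjid year acerecode
duplicates tag subjid year acerecode, gen(dupid)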
egen tagsubj = tag(subjid year acerecode)
bysort subjid year acerecode: egen mintag = min(tagsubj)
tab mintag if tagsubj==1

*There will be multiple observations per person if the person's home zip code matched to multiple counties

save TFS_allctiesperstud.dta, replace

*Get a dataset with only the people whose data merged to the restricted data but who were not in the IPEDS roster
*For these people, we will use instzip from the restricted data

*Appendix E text statistics, third paragraph:  observations from the public-access data for 
*whom we do not obtain county FIPS or other IPEDS variables for their 
*university (get fraction of total by taking total from first tab command below and divide by 
*total from second tab command below)

tab mergepubaceyearziprobust if cty_fips==. & tagsubj==1, missing
tab mergepubaceyearziprobust if tagsubj==1, missing
tab mergepubaceyearziprobust mergeinstctyname if cty_fips==. & tagsubj==1, missing
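
*A hedged addition (not in the original do-file): compute the fraction described above
*directly instead of dividing the two tab totals by hand. The local macro names
*nnocty and ntotal are illustrative.
count if cty_fips==. & tagsubj==1
local nnocty = r(N)
count if tagsubj==1
local ntotal = r(N)
display "Fraction of students without university county FIPS: " `nnocty'/`ntotal'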

*Observations that are not in the IPEDS roster
tab mergepubaceyearziprobust mergeinstctyname if cty_fips==. & tagsubj==1 & unitid~=., missing

*The observations with missing unitid are those that do not have unitid in the restricted data
tab unitid if cty_fips==. & tagsubj==1 & mergepubaceyearziprobust==3 & mergeinstctyname==1, missing

keep if mergepubaceyearziprobust==3 & cty_fips==.
save restricted_noctyfips.dta, replace

clear 
use TFS_allctiesperstud.dta

*Construct a dataset with one observation per individual, with just one home county
*Most people already have only one home county, but for a few people the zip code merged to multiple counties

*If their zip merged to both a normal school county and an asylum county, keep the normal school county observation,
*since we want to treat them as having grown up in a normal school county (they did in part)

*Appendix E text statistic 1st paragraph: count number of dropped observations below 
*(merge to both a normal school and asylum county) 
*relative to total number of students above (total from tab mintag if tagsubj==1)
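*A hedged addition (not in the original): display the number of observations that the
*drop below removes, so the statistic can be read directly from the log
count if maxhasnorm~=minhasnorm & home_hasnormalschool==0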
drop if maxhasnorm~=minhasnorm & home_hasnormalschool==0

*If an individual merged to multiple normal school counties or multiple asylum counties, keep the county where they 
*went to university if they did attend university in one of them.

gen homectyfipsU = homectyfips if homectyfips==cty_fips
bysort subjid year acerecode: egen maxhomectyfipsU = max(homectyfipsU)

gen homeinstsamectyn = cty_fips == homectyfips
replace homeinstsamectyn = . if cty_fips==.|homectyfips==.
local vars homeinstsamectyn
foreach x of local vars{
bysort subjid year acerecode: egen max`x' = max(`x')
}

egen tagsubjn = tag(subjid year acerecode)

gen homectyfipswmax = homectyfips
replace homectyfipswmax = maxhomectyfipsU if tagsubjn==1 & maxhomeinstsamectyn==1

bysort subjid year acerecode: egen mintagn = min(tagsubjn)

*Appendix E text statistic 1st paragraph: number in multiple normal school or multiple asylum counties
tab mintagn if tagsubjn==1

*If individuals didn't attend university in any of the home counties they merge to, 
*just choose one (the one with tagsubjn=1).

keep if tagsubjn==1

drop homectyfips homeinstsamecty
rename homectyfipswmax homectyfips 
rename maxhomeinstsamectyn homeinstsamecty

count
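
*Optional check (an addition, not part of the original): after keeping one tagged observation
*per group, subjid-year-acerecode should uniquely identify rows. The missok option allows
*missing key values; capture noisily reports a failure without stopping the do-file.
capture noisily isid subjid year acerecode, missok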
save choice_demog_merge_normasylum.dta, replace


capture erase choice_demog.dta




